A Unifying Framework of Bilinear LSTMs
This paper presents a novel unifying framework of bilinear LSTMs that can represent and exploit the nonlinear interactions among input features in sequence datasets, achieving superior performance over a linear LSTM without incurring more parameters to be learned. To realize this, our framework balances the expressivity of the linear vs. bilinear terms by trading off the hidden state vector size against the approximation quality of the weight matrix in the bilinear term, so as to optimize the performance of our bilinear LSTM while keeping the parameter count fixed. We empirically evaluate the performance of our bilinear LSTM on several language-based sequence learning tasks to demonstrate its general applicability.
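
As an illustration only (hypothetical names, not the authors' code), the sketch below shows one way a rank-r factorization can add a bilinear input-hidden interaction to a gate while keeping the parameter count close to that of a linear gate; the rank r plays the role of the approximation-quality knob traded off against the hidden state size above.

```python
import torch
import torch.nn as nn

class LowRankBilinearGate(nn.Module):
    """Toy gate pre-activation with a linear part plus a rank-r bilinear
    interaction: z_o = (W_x x + W_h h)_o + x^T (sum_r U[o,:,r] V[o,:,r]^T) h.
    A full bilinear tensor would need O(H^2 * I) parameters; the rank-r
    factors need only O(H * r * (I + H))."""

    def __init__(self, input_dim, hidden_dim, rank):
        super().__init__()
        self.linear_x = nn.Linear(input_dim, hidden_dim)
        self.linear_h = nn.Linear(hidden_dim, hidden_dim, bias=False)
        # Rank-r factors of each output unit's bilinear weight matrix.
        self.U = nn.Parameter(0.01 * torch.randn(hidden_dim, input_dim, rank))
        self.V = nn.Parameter(0.01 * torch.randn(hidden_dim, hidden_dim, rank))

    def forward(self, x, h):
        # x: (batch, input_dim), h: (batch, hidden_dim)
        xu = torch.einsum('bi,oir->bor', x, self.U)  # input-side projection
        hv = torch.einsum('bh,ohr->bor', h, self.V)  # hidden-side projection
        bilinear = (xu * hv).sum(-1)                 # contract over rank r
        return self.linear_x(x) + self.linear_h(h) + bilinear
```

Shrinking the hidden size while raising the rank (or vice versa) changes expressivity at roughly constant parameter count, which is the balance the framework optimizes.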
Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients
Federated optimization, an emerging paradigm which finds wide real-world
applications such as federated learning, enables multiple clients (e.g., edge
devices) to collaboratively optimize a global function. The clients do not
share their local datasets and typically only share their local gradients.
However, the gradient information is not available in many applications of
federated optimization, which hence gives rise to the paradigm of federated
zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from
the limitations of query and communication inefficiency, which can be
attributed to (a) their reliance on a substantial number of function queries
for gradient estimation and (b) the significant disparity between their
realized local updates and the intended global updates. To this end, we (a) introduce trajectory-informed gradient surrogates, which exploit the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over existing approaches, which is supported by real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.
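
For intuition, here is a minimal sketch of the surrogate-gradient idea under an assumed kernel-ridge-regression form (hypothetical names, not the paper's exact estimator): fit a surrogate to the entire history of function queries and differentiate it analytically, so gradient estimates cost no extra queries.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

class TrajectorySurrogate:
    """Hypothetical sketch of a trajectory-informed gradient surrogate:
    fit kernel ridge regression to the whole history of function queries,
    then differentiate the fitted surrogate analytically instead of
    spending fresh queries on finite-difference gradient estimates."""

    def __init__(self, lengthscale=1.0, reg=1e-6):
        self.ls, self.reg, self.X, self.alpha = lengthscale, reg, None, None

    def fit(self, X, y):
        K = rbf_kernel(X, X, self.ls) + self.reg * np.eye(len(X))
        self.X, self.alpha = X, np.linalg.solve(K, y)

    def grad(self, x):
        # d/dx k(x, xi) = -(x - xi) / ls^2 * k(x, xi) for the RBF kernel
        k = rbf_kernel(x[None, :], self.X, self.ls)[0]            # (n,)
        dk = -(x[None, :] - self.X) / self.ls ** 2 * k[:, None]   # (n, d)
        return dk.T @ self.alpha

# Toy usage: gradient estimate at a new point from past queries only.
X_hist = np.random.randn(20, 3); y_hist = np.sin(X_hist).sum(1)
s = TrajectorySurrogate(); s.fit(X_hist, y_hist)
print(s.grad(np.zeros(3)))
```

In a federated setting, each client would maintain such a surrogate from its own query trajectory; the paper's adaptive gradient correction, not sketched here, then reduces the gap between local and global updates.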
Nonmyopic ε-Bayes-Optimal Active Learning of Gaussian Processes
A fundamental issue in active learning of Gaussian processes is the exploration-exploitation trade-off. This paper presents a novel nonmyopic ε-Bayes-optimal active learning (ε-BAL) approach that jointly and naturally optimizes this trade-off. In contrast, existing works have primarily developed myopic/greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ε-BAL with a performance guarantee and empirically demonstrate using synthetic and real-world datasets that, with a limited budget, it outperforms the state-of-the-art algorithms.
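
To illustrate what nonmyopic means here, the toy recursion below scores a query by its immediate reward plus the expected value of acting optimally over the remaining horizon (hypothetical names; posterior variance is used as a crude stand-in for the Bayes-optimal reward). Its cost grows exponentially with the horizon, which is exactly the blow-up the paper's ε-BAL approximation and anytime algorithm are designed to tame.

```python
import numpy as np

def gp_posterior(X, y, Xs, ls=1.0, noise=0.1):
    """Exact 1D GP posterior mean/variance with an RBF kernel (toy helper)."""
    def k(A, B):
        return np.exp(-0.5 * ((A[:, None] - B[None, :]) ** 2) / ls ** 2)
    K = k(X, X) + noise ** 2 * np.eye(len(X))
    Ks, Kss = k(X, Xs), k(Xs, Xs)
    L = np.linalg.solve(K, Ks)
    return L.T @ y, np.diag(Kss) - np.sum(Ks * L, axis=0)

def lookahead_value(X, y, cands, horizon, n_samples=3):
    """Value of querying optimally for `horizon` more steps: immediate
    reward plus the expected value of acting optimally afterwards, with
    the expectation over the unseen label approximated by a few posterior
    samples. Exponential in horizon; for illustration only."""
    if horizon == 0:
        return 0.0
    mu, var = gp_posterior(X, y, cands)
    best = -np.inf
    for i, x in enumerate(cands):
        future = 0.0
        for _ in range(n_samples):
            y_sim = np.random.normal(mu[i], np.sqrt(max(var[i], 1e-12)))
            future += lookahead_value(np.append(X, x), np.append(y, y_sim),
                                      cands, horizon - 1, n_samples)
        best = max(best, var[i] + future / n_samples)
    return best

# Toy usage: 1D inputs, horizon-2 lookahead over a small candidate grid.
X0, y0 = np.array([0.0, 1.0]), np.array([0.2, -0.1])
print(lookahead_value(X0, y0, np.linspace(-2, 2, 9), horizon=2))
```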
Top-k Ranking Bayesian Optimization
This paper presents a novel approach to top-k ranking Bayesian optimization (top-k ranking BO), which is a practical and significant generalization of preferential BO to handle top-k ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations but is also supported by a classic random utility model. Another equally important contribution is the introduction of the first information-theoretic acquisition function in BO with preferential observations, called multinomial predictive entropy search (MPES), which is flexible in handling these observations and can be optimized over all inputs of a query jointly. MPES possesses superior performance compared with existing acquisition functions that select the inputs of a query one at a time greedily. We empirically evaluate the performance of MPES using several synthetic benchmark functions, the CIFAR-10 dataset, and the SUSHI preference dataset. (35th AAAI Conference on Artificial Intelligence (AAAI 2021); extended version with derivations, 13 pages.)
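
As background, a classic random utility model that accommodates ranking observations is the Plackett-Luce (multinomial logit) model; the sketch below (illustrative only, not the paper's surrogate, which also handles ties/indifference) computes the likelihood of a top-k ranking given latent utility values.

```python
import numpy as np

def top_k_log_likelihood(utilities, ranking):
    """Plackett-Luce likelihood of a top-k ranking of query inputs given
    latent utilities f(x): the ranked items are chosen one by one, each
    with probability proportional to exp(f) among items not yet chosen."""
    remaining = list(range(len(utilities)))
    log_lik = 0.0
    for chosen in ranking:                  # e.g. the observed top-2 of 5
        logits = utilities[remaining]
        log_lik += utilities[chosen] - np.log(np.sum(np.exp(logits)))
        remaining.remove(chosen)
    return log_lik

# Example: 5 query inputs, observed top-2 ranking (input 3 best, then 0).
f = np.array([0.8, -0.2, 0.1, 1.5, 0.3])
print(top_k_log_likelihood(f, ranking=[3, 0]))
```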
Hessian-Aware Bayesian Optimization for Decision Making Systems
Many approaches for optimizing decision-making systems rely on gradient-based methods requiring informative feedback from the environment. However, when such feedback is sparse or uninformative, these approaches may perform poorly. Derivative-free approaches such as Bayesian Optimization mitigate the dependency on the quality of gradient feedback but are known to scale poorly in the high-dimensional setting of complex decision-making systems. This problem is exacerbated if the system requires interactions between several actors cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of actor interactions through the concept of role. Additionally, we introduce Hessian-aware Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters. Experimental results demonstrate that our method (HA-GP-UCB) works effectively on several benchmarks under resource constraints and malformed feedback settings.
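
For context, a generic GP-UCB acquisition step is sketched below (plain GP-UCB with assumed helper names; how HA-GP-UCB injects Hessian information into this loop is specific to the paper and not reproduced here).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gp_ucb_step(gp, candidates, beta=2.0):
    """Pick the candidate maximizing the upper confidence bound
    mean + sqrt(beta) * std (generic GP-UCB, not HA-GP-UCB itself)."""
    mu, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(mu + np.sqrt(beta) * std)]

# Toy loop over a stand-in black-box reward on 4 role parameters.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 4))       # initial random designs
y = -np.sum(X ** 2, axis=1)               # stand-in reward to maximize
for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    x_next = gp_ucb_step(gp, rng.uniform(-1, 1, size=(256, 4)))
    X = np.vstack([X, x_next])
    y = np.append(y, -np.sum(x_next ** 2))
```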
A Distributed Variational Inference Framework for Unifying Parallel Sparse Gaussian Process Regression Models
This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and covariance function of the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized, for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets.
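
To make the noise model concrete, the toy below builds the covariance of a first-order Gaussian Markov (AR(1)) noise process (hypothetical helper and constants); its precision matrix is tridiagonal, and a Markov order of p widens the band to 2p+1, which is how varying the order yields different variational SGPR models.

```python
import numpy as np

def ar1_noise_covariance(n, sigma=0.5, rho=0.6):
    """Observation noises as a finite realization of a first-order Gaussian
    Markov (AR(1)) process: cov(eps_i, eps_j) = sigma^2 * rho^|i-j|.
    The Markov property makes the precision (inverse covariance) banded,
    which is what keeps such correlated-noise models tractable."""
    idx = np.arange(n)
    return sigma ** 2 * rho ** np.abs(idx[:, None] - idx[None, :])

C = ar1_noise_covariance(6)
print(np.round(np.linalg.inv(C), 2))   # tridiagonal up to numerical noise
```

Setting rho = 0 recovers i.i.d. observation noise, i.e., the standard SGPR setting, as one special case of the family.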
Fair yet Asymptotically Equal Collaborative Learning
In collaborative learning with streaming data, nodes (e.g., organizations)
jointly and continuously learn a machine learning (ML) model by sharing the
latest model updates computed from their latest streaming data. For the more
resourceful nodes to be willing to share their model updates, they need to be
fairly incentivized. This paper explores an incentive design that guarantees
fairness so that nodes receive rewards commensurate to their contributions. Our
approach leverages an explore-then-exploit formulation to estimate the nodes'
contributions (i.e., exploration) for realizing our theoretically guaranteed
fair incentives (i.e., exploitation). However, we observe a "rich get richer" phenomenon arising from existing approaches to guaranteeing fairness, which discourages the participation of the less resourceful nodes. To remedy this, we additionally preserve asymptotic equality, i.e., less resourceful nodes eventually achieve performance equal to that of the more resourceful/"rich" nodes. We empirically demonstrate in two settings with real-world streaming data, federated online incremental learning and federated reinforcement learning, that our proposed approach outperforms existing baselines in fairness and learning performance while remaining competitive in preserving equality. (Accepted to the 40th International Conference on Machine Learning (ICML 2023); 37 pages.)
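
The toy sketch below uses an entirely hypothetical reward rule, not the paper's mechanism, to illustrate how contribution-proportional rewards can be blended toward a uniform split over rounds, so that fairness dominates early while equality holds asymptotically.

```python
import numpy as np

def reward_shares(contributions, t, decay=0.05):
    """Hypothetical 'fair yet asymptotically equal' reward rule: early on,
    a node's share tracks its estimated contribution (fairness); the
    contribution-based gap decays over rounds t, so all nodes' shares
    converge to equality (asymptotic equality)."""
    c = np.asarray(contributions, dtype=float)
    fair = c / c.sum()                        # contribution-proportional
    equal = np.full_like(fair, 1.0 / len(c))  # uniform split
    w = np.exp(-decay * t)                    # fairness weight decays
    return w * fair + (1.0 - w) * equal

print(reward_shares([5.0, 2.0, 1.0], t=0))    # fairness-dominated
print(reward_shares([5.0, 2.0, 1.0], t=100))  # near-equal
```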
Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee
The growing literature of Federated Learning (FL) has recently inspired
Federated Reinforcement Learning (FRL) to encourage multiple agents to
federatively build a better decision-making policy without sharing raw
trajectories. Despite its promising applications, existing works on FRL fail to I) provide theoretical analysis of its convergence and II) account for random system failures and adversarial attacks. Towards this end, we propose the first FRL framework whose convergence is guaranteed and which is tolerant to less than half of the participating agents suffering random system failures or acting as adversarial attackers. We prove that the sample efficiency of the proposed framework is guaranteed to improve with the number of agents and is able to account for such potential failures or attacks. All theoretical results are empirically verified on various RL benchmark tasks. (Published at NeurIPS 2021; extended version with proofs and additional experimental details and results.)
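
As an illustration of why a less-than-half threshold is natural, the sketch below aggregates agents' updates with a coordinate-wise median, a standard Byzantine-tolerant aggregator (not necessarily the paper's exact rule): the median is unaffected so long as honest agents form a majority.

```python
import numpy as np

def robust_aggregate(updates):
    """Coordinate-wise median of agents' policy-gradient updates. Each
    coordinate of the result lies within the honest agents' range as long
    as fewer than half of the agents are faulty or adversarial."""
    return np.median(np.stack(updates), axis=0)

# 5 agents, 2 of them adversarial: the median stays near the honest values.
honest = [np.array([1.0, -0.5]) + 0.1 * np.random.randn(2) for _ in range(3)]
attack = [np.array([100.0, 100.0]) for _ in range(2)]
print(robust_aggregate(honest + attack))
```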
Batch Bayesian Optimization for Replicable Experimental Design
Many real-world experimental design problems (a) evaluate multiple
experimental conditions in parallel and (b) replicate each condition multiple
times due to large and heteroscedastic observation noise. Given a fixed total
budget, this naturally induces a trade-off between evaluating more unique
conditions while replicating each of them fewer times vs. evaluating fewer
unique conditions and replicating each more times. Moreover, in these problems,
practitioners may be risk-averse and hence prefer an input with both good
average performance and small variability. To tackle both challenges, we
propose the Batch Thompson Sampling for Replicable Experimental Design
(BTS-RED) framework, which encompasses three algorithms. Our BTS-RED-Known and
BTS-RED-Unknown algorithms, for, respectively, known and unknown noise
variance, choose the number of replications adaptively rather than
deterministically such that an input with a larger noise variance is replicated
more times. As a result, despite the noise heteroscedasticity, both algorithms
enjoy a theoretical guarantee and are asymptotically no-regret. Our
Mean-Var-BTS-RED algorithm aims at risk-averse optimization and is also
asymptotically no-regret. We also show the effectiveness of our algorithms in
two practical real-world applications: precision agriculture and AutoML. (Accepted to NeurIPS 2023.)
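
A back-of-the-envelope sketch of the adaptive-replication idea (hypothetical constants, not the paper's exact rule): replicate each condition enough times that its averaged observation reaches a target variance, so noisier conditions are replicated more, exactly as described above.

```python
import numpy as np

def adaptive_replications(noise_var, target_var=0.25, n_max=20):
    """Averaging n replications of a condition with noise variance sigma^2
    yields an observation with variance sigma^2 / n, so n >= sigma^2 /
    target_var replications suffice to hit the target (capped at n_max)."""
    return int(np.clip(np.ceil(noise_var / target_var), 1, n_max))

for s2 in [0.1, 0.5, 2.0]:
    print(s2, '->', adaptive_replications(s2), 'replications')
```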